Electrical Flows, Laplacian Systems, and Faster Approximation of Maximum Flow in Undirected Graphs
We introduce a new approach to computing an approximately maximum s-t flow in
a capacitated, undirected graph. This flow is computed by solving a sequence of
electrical flow problems. Each electrical flow is given by the solution of a
system of linear equations in a Laplacian matrix, and thus may be approximately
computed in nearly-linear time.
Using this approach, we develop the fastest known algorithm for computing
approximately maximum s-t flows. For a graph having n vertices and m edges, our
algorithm computes a (1-\epsilon)-approximately maximum s-t flow in time
\tilde{O}(mn^{1/3} \epsilon^{-11/3}). A dual version of our approach computes a
(1+\epsilon)-approximately minimum s-t cut in time
\tilde{O}(m+n^{4/3}\epsilon^{-8/3}), which is the fastest known algorithm for this
problem as well. Previously, the best dependence on m and n was achieved by the
algorithm of Goldberg and Rao (J. ACM 1998), which can be used to compute
approximately maximum s-t flows in time \tilde{O}(m\sqrt{n}\epsilon^{-1}), and
approximately minimum s-t cuts in time \tilde{O}(m+n^{3/2}\epsilon^{-3}).
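The core subroutine described above, computing an electrical flow by solving a Laplacian linear system, can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the graph, conductances, and the dense Gaussian-elimination solver are stand-ins, whereas the paper relies on nearly-linear-time approximate Laplacian solvers.

```python
# Sketch: an s-t electrical flow on a small undirected graph, obtained by
# solving the Laplacian system L v = chi_{s,t} with chi_{s,t} = e_s - e_t.
# Edge currents are then f(u,w) = c(u,w) * (v_u - v_w).

def laplacian(n, edges):
    """Weighted graph Laplacian as a dense list-of-lists matrix."""
    L = [[0.0] * n for _ in range(n)]
    for u, w, c in edges:
        L[u][u] += c
        L[w][w] += c
        L[u][w] -= c
        L[w][u] -= c
    return L

def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (illustrative,
    O(n^3); the paper uses approximate Laplacian solvers instead)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

# Triangle graph with unit conductances: s = 0, t = 2.  One unit of
# current enters at s and leaves at t.  The Laplacian is singular, so we
# ground vertex t by deleting its row and column before solving.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
L = laplacian(3, edges)
reduced = [row[:2] for row in L[:2]]   # ground vertex 2
b = [1.0, 0.0]                         # chi_{s,t} restricted to vertices 0, 1
v = solve(reduced, b) + [0.0]          # vertex potentials, with v_t = 0
flows = {(u, w): c * (v[u] - v[w]) for u, w, c in edges}
```

On this triangle the direct edge (0, 2) carries 2/3 of the unit of current and the two-hop path carries 1/3, matching the effective resistance 2/3 between s and t.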
Deep reinforcement learning from human preferences
For sophisticated reinforcement learning (RL) systems to interact usefully
with real-world environments, we need to communicate complex goals to these
systems. In this work, we explore goals defined in terms of (non-expert) human
preferences between pairs of trajectory segments. We show that this approach
can effectively solve complex RL tasks without access to the reward function,
including Atari games and simulated robot locomotion, while providing feedback
on less than one percent of our agent's interactions with the environment. This
reduces the cost of human oversight far enough that it can be practically
applied to state-of-the-art RL systems. To demonstrate the flexibility of our
approach, we show that we can successfully train complex novel behaviors with
about an hour of human time. These behaviors and environments are considerably
more complex than any that have been previously learned from human feedback.
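The preference-learning idea above can be sketched in miniature: a reward model is fit to pairwise comparisons between trajectory segments by treating each human preference as a Bradley-Terry outcome. The linear reward features, synthetic "human" labels, and plain SGD below are illustrative stand-ins for the paper's neural reward model and real annotators.

```python
import math
import random

def segment_return(theta, segment):
    """Predicted reward for a segment: dot product with each observation's
    feature vector, summed over the segment."""
    return sum(sum(t * x for t, x in zip(theta, obs)) for obs in segment)

def preference_loss_grad(theta, seg1, seg2, label):
    """Gradient of the cross-entropy loss for P(seg1 preferred over seg2)
    under the Bradley-Terry model: P = sigmoid(r1 - r2)."""
    r1, r2 = segment_return(theta, seg1), segment_return(theta, seg2)
    p1 = 1.0 / (1.0 + math.exp(r2 - r1))
    err = p1 - label                      # d(loss)/d(r1) = -d(loss)/d(r2)
    grad = [0.0] * len(theta)
    for obs in seg1:
        for i, x in enumerate(obs):
            grad[i] += err * x
    for obs in seg2:
        for i, x in enumerate(obs):
            grad[i] -= err * x
    return grad

# Synthetic setup: the true (hidden) reward weights only the first
# feature; the simulated "human" prefers the segment with higher true
# return.  SGD on the preference loss should recover that direction.
rng = random.Random(0)
true_theta = [1.0, 0.0]
theta = [0.0, 0.0]
for _ in range(2000):
    seg1 = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(5)]
    seg2 = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(5)]
    label = 1.0 if segment_return(true_theta, seg1) > segment_return(true_theta, seg2) else 0.0
    grad = preference_loss_grad(theta, seg1, seg2, label)
    theta = [t - 0.05 * g for t, g in zip(theta, grad)]
```

After training, `theta` should place most of its weight on the first feature, i.e. the comparisons alone suffice to recover the reward direction without ever observing reward values.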
A Cryptographic Test of Quantumness and Certifiable Randomness from a Single Quantum Device
We give a protocol for producing certifiable randomness from a single untrusted quantum device that is polynomial-time bounded. The randomness is certified to be statistically close to uniform from the point of view of any computationally unbounded quantum adversary that may share entanglement with the quantum device. The protocol relies on the existence of post-quantum secure trapdoor claw-free functions, and introduces a new primitive for constraining the power of an untrusted quantum device. We then show how to construct this primitive based on the hardness of the learning with errors (LWE) problem. The randomness protocol can also be used as the basis for an efficiently verifiable "quantum supremacy" proposal, thus answering an outstanding challenge in the field.
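The hardness assumption underlying the construction, learning with errors (LWE), can be illustrated concretely: an LWE sample is a pair (a, b) with b = <a, s> + e (mod q) for a secret vector s and small noise e. The toy parameters below (n = 8, q = 97) are for illustration only and offer no security; the claw-free function construction in the paper is far more involved.

```python
import random

def lwe_samples(n, q, m, s, noise=1, rng=random):
    """Return m LWE samples (a, b) with b = <a, s> + e mod q, |e| <= noise."""
    samples = []
    for _ in range(m):
        a = [rng.randrange(q) for _ in range(n)]
        e = rng.randint(-noise, noise)
        b = (sum(ai * si for ai, si in zip(a, s)) + e) % q
        samples.append((a, b))
    return samples

def noise_of(a, b, s, q):
    """With the secret in hand, recovering the (centered) noise is easy."""
    e = (b - sum(ai * si for ai, si in zip(a, s))) % q
    return e if e <= q // 2 else e - q

rng = random.Random(0)
n, q = 8, 97
secret = [rng.randrange(q) for _ in range(n)]
samples = lwe_samples(n, q, 16, secret, rng=rng)
```

The asymmetry shown here is the point: checking a candidate secret against the samples is trivial, while recovering the secret from the samples alone is conjectured to be hard even for quantum algorithms, which is what makes LWE a post-quantum foundation for primitives like the trapdoor claw-free functions used in the protocol.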
- …